Overview

Dataset statistics

Number of variables: 24
Number of observations: 1000
Missing cells: 5308
Missing cells (%): 22.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 368.6 KiB
Average record size in memory: 377.4 B
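
The two derived figures above can be checked from the raw counts. A minimal sketch of that arithmetic, assuming "Missing cells (%)" is the missing-cell count divided by variables times observations, and "Average record size" is the total memory divided by the number of observations (the variable names are illustrative and not taken from the report):

```python
# Raw figures copied from the dataset statistics above.
n_variables = 24
n_observations = 1000
missing_cells = 5308
total_size_kib = 368.6

# Assumed definitions (see the lead-in): share of empty cells, bytes per row.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")            # 22.1%
print(f"Average record size: {avg_record_bytes:.1f} B")    # 377.4 B
```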

Variable types

NUM: 18
CAT: 3
UNSUPPORTED: 2
DATE: 1

Reproduction

Analysis started: 2020-04-04 15:19:34.444816
Analysis finished: 2020-04-04 15:20:46.873167
Version: pandas-profiling v2.5.3
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Haltestelle has a high cardinality: 366 distinct values (High cardinality)
Nachste_Haltestelle has a high cardinality: 373 distinct values (High cardinality)
Tage_DWV is highly correlated with Tage_DTV (High correlation)
Tage_DTV is highly correlated with Tage_DWV and 2 other fields (High correlation)
Tage_SA is highly correlated with Tage_DTV (High correlation)
Tage_SO is highly correlated with Tage_DTV (High correlation)
Durchschnitt_Wochentag is highly correlated with Durchschnitt_Tag (High correlation)
Durchschnitt_Tag is highly correlated with Durchschnitt_Wochentag and 2 other fields (High correlation)
Durchschnitt_Samstag is highly correlated with Durchschnitt_Tag (High correlation)
Durchschnitt_Sonntag is highly correlated with Besetzung and 1 other fields (High correlation)
Besetzung is highly correlated with Durchschnitt_Sonntag (High correlation)
Uhrzeit has 29 (2.9%) missing values (Missing)
Anzahl_Messungen has 111 (11.1%) missing values (Missing)
Einsteiger has 93 (9.3%) missing values (Missing)
Aussteiger has 132 (13.2%) missing values (Missing)
Tage_DTV has 207 (20.7%) missing values (Missing)
Tage_DWV has 553 (55.3%) missing values (Missing)
Tage_SA has 797 (79.7%) missing values (Missing)
Tage_SO has 865 (86.5%) missing values (Missing)
Haltestelle has 23 (2.3%) missing values (Missing)
Haltestelle_Nummer has 23 (2.3%) missing values (Missing)
Nachste_Haltestelle has 23 (2.3%) missing values (Missing)
Nachste_Haltestelle_Nummer has 23 (2.3%) missing values (Missing)
Durchschnitt_Tag has 209 (20.9%) missing values (Missing)
Durchschnitt_Wochentag has 554 (55.4%) missing values (Missing)
Durchschnitt_Samstag has 797 (79.7%) missing values (Missing)
Durchschnitt_Sonntag has 866 (86.6%) missing values (Missing)
Haltestelle_Id is an unsupported type, check if it needs cleaning or further analysis (Rejected)
Nachste_Haltestelle_Id is an unsupported type, check if it needs cleaning or further analysis (Rejected)

Variables

df_index
Real number (ℝ≥0)

Distinct count: 995
Unique (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48687.227
Minimum: 391
Maximum: 99833
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 391
5-th percentile: 4484.55
Q1: 23903.75
Median: 46963
Q3: 74761
95-th percentile: 94277.1
Maximum: 99833
Range: 99442
Interquartile range (IQR): 50857.25

Descriptive statistics

Standard deviation: 28690.08213
Coefficient of variation (CV): 0.5892732837
Kurtosis: -1.173282813
Mean: 48687.227
Median Absolute Deviation (MAD): 24659.61717
Skewness: 0.07697102401
Sum: 48687227
Variance: 823120812.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 391. 99833.], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
11421 2 0.2%
 
8322 2 0.2%
 
86181 2 0.2%
 
53982 2 0.2%
 
2845 2 0.2%
 
18429 1 0.1%
 
21178 1 0.1%
 
60157 1 0.1%
 
92743 1 0.1%
 
88761 1 0.1%
 
Other values (985) 985 98.5%
 
ValueCountFrequency (%) 
391 1 0.1%
 
424 1 0.1%
 
493 1 0.1%
 
683 1 0.1%
 
723 1 0.1%
 
ValueCountFrequency (%) 
99833 1 0.1%
 
99780 1 0.1%
 
99732 1 0.1%
 
99614 1 0.1%
 
99460 1 0.1%
 

Linie
Real number (ℝ≥0)

Distinct count: 72
Unique (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 83.026
Minimum: 2
Maximum: 916
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 8
Median: 15
Q3: 70
95-th percentile: 412
Maximum: 916
Range: 914
Interquartile range (IQR): 62

Descriptive statistics

Standard deviation: 163.1348899
Coefficient of variation (CV): 1.964865102
Kurtosis: 9.756250566
Mean: 83.026
Median Absolute Deviation (MAD): 99.901732
Skewness: 3.047404826
Sum: 83026
Variance: 26612.99232
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 2. 2.5 4.5 5.5 6.5 ... 414.5 703.5 704.5 911. 916. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2 86 8.6%
 
6 56 5.6%
 
32 55 5.5%
 
8 55 5.5%
 
9 54 5.4%
 
31 49 4.9%
 
10 48 4.8%
 
13 44 4.4%
 
3 36 3.6%
 
80 35 3.5%
 
Other values (62) 482 48.2%
 
ValueCountFrequency (%) 
2 86 8.6%
 
3 36 3.6%
 
4 35 3.5%
 
5 11 1.1%
 
6 56 5.6%
 
ValueCountFrequency (%) 
916 3 0.3%
 
912 4 0.4%
 
910 3 0.3%
 
751 5 0.5%
 
744 1 0.1%
 

Richtung
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
ValueCountFrequency (%) 
2 522 52.2%
 
1 478 47.8%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 2 100.0%
 
ValueCountFrequency (%) 
Common 2 100.0%
 
ValueCountFrequency (%) 
ASCII 2 100.0%
 

Anzahl_Haltestellen
Real number (ℝ≥0)

Distinct count: 32
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.847
Minimum: 1
Maximum: 32
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
Median: 11
Q3: 17
95-th percentile: 26
Maximum: 32
Range: 31
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 7.69446527
Coefficient of variation (CV): 0.6494863907
Kurtosis: -0.5655770748
Mean: 11.847
Median Absolute Deviation (MAD): 6.382924
Skewness: 0.5558555147
Sum: 11847
Variance: 59.2047958
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 14.5 21.5 26.5 32. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
4 58 5.8%
 
2 57 5.7%
 
8 56 5.6%
 
10 54 5.4%
 
5 53 5.3%
 
9 49 4.9%
 
11 49 4.9%
 
1 47 4.7%
 
6 46 4.6%
 
13 45 4.5%
 
Other values (22) 486 48.6%
 
ValueCountFrequency (%) 
1 47 4.7%
 
2 57 5.7%
 
3 41 4.1%
 
4 58 5.8%
 
5 53 5.3%
 
ValueCountFrequency (%) 
32 1 0.1%
 
31 7 0.7%
 
30 9 0.9%
 
29 10 1.0%
 
28 10 1.0%
 

Haltestelle_Id
Unsupported

REJECTED
UNSUPPORTED
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Nachste_Haltestelle_Id
Unsupported

REJECTED
UNSUPPORTED
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Uhrzeit
Date

MISSING
Distinct count: 933
Unique (%): 96.1%
Missing: 29
Missing (%): 2.9%
Memory size: 7.9 KiB
Minimum: 1900-01-01 04:54:24
Maximum: 1900-01-01 23:59:48
Histogram

Anzahl_Messungen
Real number (ℝ≥0)

MISSING
Distinct count: 77
Unique (%): 8.7%
Missing: 111
Missing (%): 11.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.28121485
Minimum: 1
Maximum: 116
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
Median: 8
Q3: 15
95-th percentile: 46.6
Maximum: 116
Range: 115
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 16.16105931
Coefficient of variation (CV): 1.216835922
Kurtosis: 8.130025064
Mean: 13.28121485
Median Absolute Deviation (MAD): 10.71732878
Skewness: 2.620768429
Sum: 11807
Variance: 261.1798382
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
1 80 8.0%
 
2 76 7.6%
 
4 64 6.4%
 
3 59 5.9%
 
6 58 5.8%
 
5 49 4.9%
 
7 44 4.4%
 
10 44 4.4%
 
8 38 3.8%
 
9 37 3.7%
 
Other values (67) 340 34.0%
 
(Missing) 111 11.1%
 
ValueCountFrequency (%) 
1 80 8.0%
 
2 76 7.6%
 
3 59 5.9%
 
4 64 6.4%
 
5 49 4.9%
 
ValueCountFrequency (%) 
116 1 0.1%
 
102 1 0.1%
 
97 1 0.1%
 
91 1 0.1%
 
89 1 0.1%
 

Einsteiger
Real number (ℝ≥0)

MISSING
Distinct count: 431
Unique (%): 47.5%
Missing: 93
Missing (%): 9.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.610423565
Minimum: 0.009999999776
Maximum: 75.37000275
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.009999999776
5-th percentile: 0.2300000042
Q1: 1
Median: 2.779999971
Q3: 6.670000076
95-th percentile: 21.13399887
Maximum: 75.37000275
Range: 75.36000275
Interquartile range (IQR): 5.670000076

Descriptive statistics

Standard deviation: 7.625798702
Coefficient of variation (CV): 1.359219783
Kurtosis: 13.60228825
Mean: 5.610423565
Median Absolute Deviation (MAD): 5.090398312
Skewness: 3.01630497
Sum: 5088.649902
Variance: 58.15280533
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
1 46 4.6%
 
2 29 2.9%
 
0.5 21 2.1%
 
4 20 2.0%
 
5 19 1.9%
 
3.5 16 1.6%
 
3 11 1.1%
 
1.330000043 11 1.1%
 
1.5 10 1.0%
 
0.6600000262 10 1.0%
 
Other values (421) 714 71.4%
 
(Missing) 93 9.3%
 
ValueCountFrequency (%) 
0.009999999776 2 0.2%
 
0.02999999933 1 0.1%
 
0.05000000075 3 0.3%
 
0.05999999866 3 0.3%
 
0.0700000003 2 0.2%
 
ValueCountFrequency (%) 
75.37000275 1 0.1%
 
49.91999817 1 0.1%
 
48.20000076 1 0.1%
 
41.33000183 1 0.1%
 
40.5 1 0.1%
 

Aussteiger
Real number (ℝ≥0)

MISSING
Distinct count: 406
Unique (%): 46.8%
Missing: 132
Missing (%): 13.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.168573856
Minimum: 0.009999999776
Maximum: 80.97000122
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.009999999776
5-th percentile: 0.1934999973
Q1: 1
Median: 2.644999981
Q3: 6.882499695
95-th percentile: 18
Maximum: 80.97000122
Range: 80.96000122
Interquartile range (IQR): 5.882499695

Descriptive statistics

Standard deviation: 6.763618469
Coefficient of variation (CV): 1.308604396
Kurtosis: 23.90267372
Mean: 5.168573856
Median Absolute Deviation (MAD): 4.578095436
Skewness: 3.538320065
Sum: 4486.320312
Variance: 45.74653244
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
1 36 3.6%
 
2 33 3.3%
 
0.5 23 2.3%
 
4 19 1.9%
 
1.5 13 1.3%
 
3 13 1.3%
 
6 12 1.2%
 
0.6600000262 12 1.2%
 
0.3300000131 11 1.1%
 
1.25 10 1.0%
 
Other values (396) 686 68.6%
 
(Missing) 132 13.2%
 
ValueCountFrequency (%) 
0.009999999776 2 0.2%
 
0.01999999955 1 0.1%
 
0.02999999933 1 0.1%
 
0.05000000075 4 0.4%
 
0.05999999866 4 0.4%
 
ValueCountFrequency (%) 
80.97000122 1 0.1%
 
54.36999893 1 0.1%
 
42 1 0.1%
 
38.59999847 1 0.1%
 
34 1 0.1%
 

Besetzung
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 712
Unique (%): 71.4%
Missing: 3
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.33268547
Minimum: 0.09000000358
Maximum: 209
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.09000000358
5-th percentile: 2.177999973
Q1: 9.25
Median: 20.25
Q3: 37.06999969
95-th percentile: 74.6780014
Maximum: 209
Range: 208.91
Interquartile range (IQR): 27.81999969

Descriptive statistics

Standard deviation: 25.69221115
Coefficient of variation (CV): 0.9399812243
Kurtosis: 6.917271137
Mean: 27.33268547
Median Absolute Deviation (MAD): 18.50615883
Skewness: 2.121451616
Sum: 27250.68945
Variance: 660.0897217
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
4 11 1.1%
 
6 10 1.0%
 
2 8 0.8%
 
8 8 0.8%
 
20 8 0.8%
 
11 8 0.8%
 
9.5 7 0.7%
 
15 7 0.7%
 
28 7 0.7%
 
13 6 0.6%
 
Other values (702) 917 91.7%
 
ValueCountFrequency (%) 
0.09000000358 1 0.1%
 
0.25 1 0.1%
 
0.400000006 1 0.1%
 
0.5 3 0.3%
 
0.6100000143 1 0.1%
 
ValueCountFrequency (%) 
209 1 0.1%
 
190.6600037 1 0.1%
 
147.25 1 0.1%
 
147 1 0.1%
 
142 1 0.1%
 

Distanz
Real number (ℝ≥0)

Distinct count: 355
Unique (%): 35.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 409.5480042
Minimum: 108
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 108
5-th percentile: 213
Q1: 297.75
Median: 370
Q3: 464
95-th percentile: 683.05
Maximum: 2020
Range: 1912
Interquartile range (IQR): 166.25

Descriptive statistics

Standard deviation: 187.7185364
Coefficient of variation (CV): 0.458355393
Kurtosis: 14.15157413
Mean: 409.5480042
Median Absolute Deviation (MAD): 122.5026016
Skewness: 2.893883705
Sum: 409548
Variance: 35238.24609
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 108. 184. 267.5 291. 297.5 ... 464.5 537. 690.5 1282.5 2020. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
280 12 1.2%
 
341 11 1.1%
 
380 10 1.0%
 
370 10 1.0%
 
320 10 1.0%
 
420 10 1.0%
 
213 10 1.0%
 
268 10 1.0%
 
272 10 1.0%
 
397 9 0.9%
 
Other values (345) 898 89.8%
 
ValueCountFrequency (%) 
108 1 0.1%
 
118 2 0.2%
 
131 1 0.1%
 
132 1 0.1%
 
135 2 0.2%
 
ValueCountFrequency (%) 
2020 1 0.1%
 
1664 1 0.1%
 
1613 1 0.1%
 
1595 1 0.1%
 
1457 1 0.1%
 

Tage_DTV
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 53
Unique (%): 6.7%
Missing: 207
Missing (%): 20.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.31509399
Minimum: 10.14999962
Maximum: 251
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 10.14999962
5-th percentile: 14.71000004
Q1: 37.27999878
Median: 52
Q3: 87
95-th percentile: 251
Maximum: 251
Range: 240.8500004
Interquartile range (IQR): 49.72000122

Descriptive statistics

Standard deviation: 69.93824005
Coefficient of variation (CV): 0.8600892727
Kurtosis: 0.3493234217
Mean: 81.31509399
Median Absolute Deviation (MAD): 55.41399384
Skewness: 1.31481123
Sum: 64482.85938
Variance: 4891.357422
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
52 107 10.7%
 
62 95 9.5%
 
48 90 9.0%
 
203 76 7.6%
 
251 50 5.0%
 
27.69000053 23 2.3%
 
87 19 1.9%
 
14.71000004 19 1.9%
 
37.27999878 17 1.7%
 
77.56999969 16 1.6%
 
Other values (43) 281 28.1%
 
(Missing) 207 20.7%
 
ValueCountFrequency (%) 
10.14999962 4 0.4%
 
10.78999996 6 0.6%
 
13.72000027 2 0.2%
 
13.77000046 7 0.7%
 
13.84000015 9 0.9%
 
ValueCountFrequency (%) 
251 50 5.0%
 
203 76 7.6%
 
183 3 0.3%
 
177.1699982 14 1.4%
 
168.3099976 3 0.3%
 

Tage_DWV
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 32
Unique (%): 7.2%
Missing: 553
Missing (%): 55.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.5014038
Minimum: 10.14999962
Maximum: 251
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 10.14999962
5-th percentile: 18.29000092
Q1: 48
Median: 77.56999969
Q3: 203
95-th percentile: 251
Maximum: 251
Range: 240.8500004
Interquartile range (IQR): 155

Descriptive statistics

Standard deviation: 81.57498169
Coefficient of variation (CV): 0.7449674511
Kurtosis: -1.30383563
Mean: 109.5014038
Median Absolute Deviation (MAD): 73.60659027
Skewness: 0.5069775581
Sum: 48947.125
Variance: 6654.478027
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
48 88 8.8%
 
203 76 7.6%
 
251 50 5.0%
 
27.69000053 23 2.3%
 
87 19 1.9%
 
77.56999969 16 1.6%
 
24.70000076 15 1.5%
 
177.1699982 14 1.4%
 
18.29000092 13 1.3%
 
25 13 1.3%
 
Other values (22) 120 12.0%
 
(Missing) 553 55.3%
 
ValueCountFrequency (%) 
10.14999962 4 0.4%
 
13.72000027 2 0.2%
 
13.84000015 9 0.9%
 
18.29000092 13 1.3%
 
20.29999924 6 0.6%
 
ValueCountFrequency (%) 
251 50 5.0%
 
203 76 7.6%
 
183 3 0.3%
 
177.1699982 14 1.4%
 
168.3099976 3 0.3%
 

Tage_SA
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 12
Unique (%): 5.9%
Missing: 797
Missing (%): 79.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.81355286
Minimum: 10.78999996
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 10.78999996
5-th percentile: 14.71000004
Q1: 22.55999947
Median: 41.20000076
Q3: 52
95-th percentile: 52
Maximum: 52
Range: 41.21000004
Interquartile range (IQR): 29.44000053

Descriptive statistics

Standard deviation: 14.61869144
Coefficient of variation (CV): 0.3766388379
Kurtosis: -1.273086548
Mean: 38.81355286
Median Absolute Deviation (MAD): 13.07326984
Skewness: -0.5355802178
Sum: 7879.149902
Variance: 213.7061462
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
52 99 9.9%
 
14.71000004 19 1.9%
 
37.27999878 17 1.7%
 
20.60000038 15 1.5%
 
31.38999939 9 0.9%
 
41.20000076 9 0.9%
 
22.55999947 8 0.8%
 
29.43000031 7 0.7%
 
10.78999996 6 0.6%
 
30.40999985 6 0.6%
 
Other values (2) 8 0.8%
 
(Missing) 797 79.7%
 
ValueCountFrequency (%) 
10.78999996 6 0.6%
 
14.71000004 19 1.9%
 
20.60000038 15 1.5%
 
21.57999992 5 0.5%
 
22.55999947 8 0.8%
 
ValueCountFrequency (%) 
52 99 9.9%
 
41.20000076 9 0.9%
 
38 3 0.3%
 
37.27999878 17 1.7%
 
31.38999939 9 0.9%
 

Tage_SO
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 10
Unique (%): 7.4%
Missing: 865
Missing (%): 86.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.63392639
Minimum: 13.77000046
Maximum: 62
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 13.77000046
5-th percentile: 15.14900017
Q1: 48.22000122
Median: 62
Q3: 62
95-th percentile: 62
Maximum: 62
Range: 48.22999954
Interquartile range (IQR): 13.77999878

Descriptive statistics

Standard deviation: 15.14799213
Coefficient of variation (CV): 0.2824330261
Kurtosis: 1.601834059
Mean: 53.63392639
Median Absolute Deviation (MAD): 11.77448273
Skewness: -1.720984936
Sum: 7240.580078
Variance: 229.4616699
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
62 95 9.5%
 
48.22000122 11 1.1%
 
13.77000046 7 0.7%
 
40.34000015 5 0.5%
 
21.64999962 4 0.4%
 
46.25 4 0.4%
 
15.73999977 3 0.3%
 
22.62999916 3 0.3%
 
48 2 0.2%
 
39.36000061 1 0.1%
 
(Missing) 865 86.5%
 
ValueCountFrequency (%) 
13.77000046 7 0.7%
 
15.73999977 3 0.3%
 
21.64999962 4 0.4%
 
22.62999916 3 0.3%
 
39.36000061 1 0.1%
 
ValueCountFrequency (%) 
62 95 9.5%
 
48.22000122 11 1.1%
 
48 2 0.2%
 
46.25 4 0.4%
 
40.34000015 5 0.5%
 

Haltestelle
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 366
Unique (%): 37.5%
Missing: 23
Missing (%): 2.3%
Memory size: 7.9 KiB
ValueCountFrequency (%) 
Zürich, Kantonalbank 16 1.6%
 
Zürich, Limmatplatz 15 1.5%
 
Zürich, Bellevue 15 1.5%
 
Zürich, Paradeplatz 15 1.5%
 
Zürich, Albisriederplatz 11 1.1%
 
Zürich, Stauffacher 11 1.1%
 
Zürich, Stockerstrasse 10 1.0%
 
Zürich, Haldenegg 10 1.0%
 
Zürich, Bucheggplatz 9 0.9%
 
Zürich, Bändliweg 8 0.8%
 
Other values (356) 857 85.7%
 
(Missing) 23 2.3%
 

Length

Max length: 30
Mean length: 20.503
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 29 48.3%
 
Uppercase_Letter 24 40.0%
 
Other_Punctuation 3 5.0%
 
Dash_Punctuation 1 1.7%
 
Open_Punctuation 1 1.7%
 
Close_Punctuation 1 1.7%
 
Space_Separator 1 1.7%
 
ValueCountFrequency (%) 
Latin 53 88.3%
 
Common 7 11.7%
 
ValueCountFrequency (%) 
ASCII 57 100.0%
 

Haltestelle_Nummer
Real number (ℝ≥0)

MISSING
Distinct count: 366
Unique (%): 37.5%
Missing: 23
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2001.911975
Minimum: 7
Maximum: 12543
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 7
5-th percentile: 439
Q1: 1076
Median: 1612
Q3: 2596
95-th percentile: 4031.4
Maximum: 12543
Range: 12536
Interquartile range (IQR): 1520

Descriptive statistics

Standard deviation: 1579.662272
Coefficient of variation (CV): 0.7890767883
Kurtosis: 14.02993666
Mean: 2001.911975
Median Absolute Deviation (MAD): 1046.198686
Skewness: 2.925086401
Sum: 1955868
Variance: 2495332.894
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
601 16 1.6%
 
1861 15 1.5%
 
440 15 1.5%
 
1557 15 1.5%
 
48 11 1.1%
 
2556 11 1.1%
 
1100 10 1.0%
 
2582 10 1.0%
 
564 9 0.9%
 
619 8 0.8%
 
Other values (356) 857 85.7%
 
(Missing) 23 2.3%
 
ValueCountFrequency (%) 
7 1 0.1%
 
46 1 0.1%
 
47 2 0.2%
 
48 11 1.1%
 
67 1 0.1%
 
ValueCountFrequency (%) 
12543 1 0.1%
 
12509 1 0.1%
 
12506 1 0.1%
 
12504 1 0.1%
 
12500 1 0.1%
 

Nachste_Haltestelle
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 373
Unique (%): 38.2%
Missing: 23
Missing (%): 2.3%
Memory size: 7.9 KiB
ValueCountFrequency (%) 
Zürich, Central 17 1.7%
 
Zürich, Bürkliplatz 14 1.4%
 
Zürich, Paradeplatz 12 1.2%
 
Zürich, Helvetiaplatz 11 1.1%
 
Zürich, Museum für Gestaltung 11 1.1%
 
Zürich, Escher-Wyss-Platz 10 1.0%
 
Schlieren, Mülligen 9 0.9%
 
Zürich, Kreuzplatz 9 0.9%
 
Zürich, Sternen Oerlikon 9 0.9%
 
Zürich, Bellevue 8 0.8%
 
Other values (363) 867 86.7%
 
(Missing) 23 2.3%
 

Length

Max length: 30
Mean length: 20.47
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 29 50.9%
 
Uppercase_Letter 23 40.4%
 
Other_Punctuation 3 5.3%
 
Space_Separator 1 1.8%
 
Dash_Punctuation 1 1.8%
 
ValueCountFrequency (%) 
Latin 52 91.2%
 
Common 5 8.8%
 
ValueCountFrequency (%) 
ASCII 54 100.0%
 

Nachste_Haltestelle_Nummer
Real number (ℝ≥0)

MISSING
Distinct count: 373
Unique (%): 38.2%
Missing: 23
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2039.640737
Minimum: 4
Maximum: 12826
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 4
5-th percentile: 435
Q1: 1041
Median: 1690
Q3: 2666
95-th percentile: 4057.8
Maximum: 12826
Range: 12822
Interquartile range (IQR): 1625

Descriptive statistics

Standard deviation: 1586.547229
Coefficient of variation (CV): 0.7778562176
Kurtosis: 13.64196926
Mean: 2039.640737
Median Absolute Deviation (MAD): 1047.565635
Skewness: 2.866873536
Sum: 1992729
Variance: 2517132.11
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
619 17 1.7%
 
615 14 1.4%
 
1861 12 1.2%
 
1158 11 1.1%
 
1667 11 1.1%
 
799 10 1.0%
 
1453 9 0.9%
 
1690 9 0.9%
 
2572 9 0.9%
 
3019 8 0.8%
 
Other values (363) 867 86.7%
 
(Missing) 23 2.3%
 
ValueCountFrequency (%) 
4 3 0.3%
 
7 2 0.2%
 
46 2 0.2%
 
47 3 0.3%
 
48 7 0.7%
 
ValueCountFrequency (%) 
12826 1 0.1%
 
12509 1 0.1%
 
12508 2 0.2%
 
12500 1 0.1%
 
12488 1 0.1%
 

Durchschnitt_Tag
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 769
Unique (%): 97.2%
Missing: 209
Missing (%): 20.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2003.029663
Minimum: 5.370299816
Maximum: 28468.7207
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 5.370299816
5-th percentile: 97.93984985
Q1: 424.4949951
Median: 969.9749756
Q3: 2141.616699
95-th percentile: 7488.740234
Maximum: 28468.7207
Range: 28463.3504
Interquartile range (IQR): 1717.121704

Descriptive statistics

Standard deviation: 3153.686523
Coefficient of variation (CV): 1.574458223
Kurtosis: 23.16275787
Mean: 2003.029663
Median Absolute Deviation (MAD): 1832.152466
Skewness: 4.156374931
Sum: 1584397.125
Variance: 9945739
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
496 2 0.2%
 
446.5999756 2 0.2%
 
216.3000031 2 0.2%
 
322.3999939 2 0.2%
 
981.2400513 2 0.2%
 
1224 2 0.2%
 
676 2 0.2%
 
589 2 0.2%
 
525 2 0.2%
 
1584 2 0.2%
 
Other values (759) 771 77.1%
 
(Missing) 209 20.9%
 
ValueCountFrequency (%) 
5.370299816 1 0.1%
 
5.394999981 1 0.1%
 
14.71000004 1 0.1%
 
14.71500015 1 0.1%
 
15.5 1 0.1%
 
ValueCountFrequency (%) 
28468.7207 1 0.1%
 
26294.58984 1 0.1%
 
25795.20898 1 0.1%
 
24833.94141 1 0.1%
 
21836.70898 1 0.1%
 

Durchschnitt_Wochentag
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 437
Unique (%): 98.0%
Missing: 554
Missing (%): 55.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2919.291016
Minimum: 5.370299816
Maximum: 28468.7207
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 5.370299816
5-th percentile: 192.7149963
Q1: 736.3707275
Median: 1558.73999
Q3: 3545.527588
95-th percentile: 10602.69043
Maximum: 28468.7207
Range: 28463.3504
Interquartile range (IQR): 2809.15686

Descriptive statistics

Standard deviation: 3906.364746
Coefficient of variation (CV): 1.338121046
Kurtosis: 13.43975353
Mean: 2919.291016
Median Absolute Deviation (MAD): 2505.747314
Skewness: 3.210196495
Sum: 1302004.25
Variance: 15259686
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
384 2 0.2%
 
1423.679932 2 0.2%
 
7056.279785 2 0.2%
 
1224 2 0.2%
 
1656 2 0.2%
 
1584 2 0.2%
 
446.5999756 2 0.2%
 
525 2 0.2%
 
816 2 0.2%
 
1712.624023 1 0.1%
 
Other values (427) 427 42.7%
 
(Missing) 554 55.4%
 
ValueCountFrequency (%) 
5.370299816 1 0.1%
 
34.61249924 1 0.1%
 
37.05000305 1 0.1%
 
46.40000153 1 0.1%
 
54.87000275 1 0.1%
 
ValueCountFrequency (%) 
28468.7207 1 0.1%
 
26294.58984 1 0.1%
 
25795.20898 1 0.1%
 
24833.94141 1 0.1%
 
21836.70898 1 0.1%
 

Durchschnitt_Samstag
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 197
Unique (%): 97.0%
Missing: 797
Missing (%): 79.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 827.0939331
Minimum: 5.394999981
Maximum: 3856.320312
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 5.394999981
5-th percentile: 45.80799866
Q1: 221.0850067
Median: 562.585022
Q3: 1184.727539
95-th percentile: 2625.999756
Maximum: 3856.320312
Range: 3850.925313
Interquartile range (IQR): 963.6425323

Descriptive statistics

Standard deviation: 814.5111084
Coefficient of variation (CV): 0.9847867041
Kurtosis: 2.460825682
Mean: 827.0939331
Median Absolute Deviation (MAD): 624.994751
Skewness: 1.585248351
Sum: 167900.0625
Variance: 663428.375
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
216.3000031 2 0.2%
 
329.1600037 2 0.2%
 
824 2 0.2%
 
1456 2 0.2%
 
981.2400513 2 0.2%
 
208 2 0.2%
 
570 1 0.1%
 
3217.23999 1 0.1%
 
1404 1 0.1%
 
554.3474121 1 0.1%
 
Other values (187) 187 18.7%
 
(Missing) 797 79.7%
 
ValueCountFrequency (%) 
5.394999981 1 0.1%
 
14.71000004 1 0.1%
 
14.71500015 1 0.1%
 
24.71279907 1 0.1%
 
26.91930008 1 0.1%
 
ValueCountFrequency (%) 
3856.320312 1 0.1%
 
3762.719971 1 0.1%
 
3744 1 0.1%
 
3553.160156 1 0.1%
 
3217.23999 1 0.1%
 

Durchschnitt_Sonntag
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 129
Unique (%): 96.3%
Missing: 866
Missing (%): 86.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 823.1079712
Minimum: 15.5
Maximum: 4053.559814
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 15.5
5-th percentile: 87.92115021
Q1: 313.2050171
Median: 620.9300537
Q3: 1156.765015
95-th percentile: 2437.778076
Maximum: 4053.559814
Range: 4038.059814
Interquartile range (IQR): 843.5599976

Descriptive statistics

Standard deviation: 739.3115845
Coefficient of variation (CV): 0.8981951461
Kurtosis: 3.058235645
Mean: 823.1079712
Median Absolute Deviation (MAD): 553.1356201
Skewness: 1.625657439
Sum: 110296.4609
Variance: 546581.625
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
496 2 0.2%
 
93 2 0.2%
 
310 2 0.2%
 
322.3999939 2 0.2%
 
589 2 0.2%
 
110.1600037 1 0.1%
 
194.8499908 1 0.1%
 
4053.559814 1 0.1%
 
899 1 0.1%
 
1194.73999 1 0.1%
 
Other values (119) 119 11.9%
 
(Missing) 866 86.6%
 
ValueCountFrequency (%) 
15.5 1 0.1%
 
37.61159897 1 0.1%
 
46.81800461 1 0.1%
 
47.09340286 1 0.1%
 
55.08000183 1 0.1%
 
ValueCountFrequency (%) 
4053.559814 1 0.1%
 
3241.359863 1 0.1%
 
2892.919922 1 0.1%
 
2745.359863 1 0.1%
 
2628.800049 1 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
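
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name `pearson_r` and the sample values are assumptions for the example and are not taken from the report or from the profiling library.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values, chosen only to exercise the function.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(xs, ys))
# Shifting and rescaling one variable leaves r unchanged (up to rounding).
print(pearson_r(xs, [3 * y + 5 for y in ys]))
```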

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
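
As a rough sketch of this definition, the values can be replaced by their ranks and the Pearson formula applied to the ranks. The helper names `average_ranks` and `spearman_rho`, the choice of average ranks for ties, and the sample values are assumptions made for illustration; they are not taken from the report.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship (y = x cubed) gives identical ranks.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering of the values enters the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.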

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
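
The pair counting described above can be sketched directly. This follows the formula exactly as stated in the text (often called tau-a); library implementations commonly use a tie-adjusted variant, so results may differ when the data contain ties. The function name `kendall_tau_a` and the sample values are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 is a tie and counts as neither
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical values: 6 pairs in total, 5 concordant and 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```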

Missing values

Sample

First rows

df_index Linie Richtung Anzahl_Haltestellen Haltestelle_Id Nachste_Haltestelle_Id Uhrzeit Anzahl_Messungen Einsteiger Aussteiger Besetzung Distanz Tage_DTV Tage_DWV Tage_SA Tage_SO Haltestelle Haltestelle_Nummer Nachste_Haltestelle Nachste_Haltestelle_Nummer Durchschnitt_Tag Durchschnitt_Wochentag Durchschnitt_Samstag Durchschnitt_Sonntag
01865113194414401900-01-01 15:12:24NaN5.752.7539.500000450.0NaNNaNNaNNaNZürich, Waidfussweg2800.0Zürich, Wipkingerplatz2937.0NaNNaNNaNNaN
16008017115865851900-01-01 19:09:302.01.50NaN1.500000299.052.000000NaN52.0NaNZürich, Werdhölzli2872.0Zürich, Bändliweg592.078.000000NaN78.000000NaN
24627221241682161900-01-01 14:38:063.015.3320.3349.330002250.013.84000013.840000NaNNaNZürich, Bellevue440.0Zürich, Opernhaus1845.0682.727234682.727234NaNNaN
333646311281181191900-01-01 17:55:482.0NaN7.5027.500000320.020.29999920.299999NaNNaNZürich, Berghaldenstrasse455.0Zürich, Loorenstrasse1584.0558.250000558.250000NaNNaN
497055141231851861900-01-01 08:39:4212.03.333.5014.500000339.062.000000NaNNaN62.0Zürich, Goldbrunnenplatz1012.0Zürich, Talwiesenstrasse2638.0899.000000NaNNaN899.0
55073821241682161900-01-01 13:15:361.027.0025.0062.000000250.0NaNNaNNaNNaNZürich, Bellevue440.0Zürich, Opernhaus1845.0NaNNaNNaNNaN
665777171131781771900-01-01 11:23:485.011.0012.2044.599998367.048.00000048.000000NaNNaNZürich, Limmatplatz1557.0Zürich, Museum für Gestaltung1667.02140.7998052140.799805NaNNaN
74435870236016001900-01-01 18:12:3610.01.00NaN10.900000310.052.000000NaN52.0NaNZürich, Im Hüsli1310.0Zürich, Marbachweg1615.0566.799988NaN566.799988NaN
85326680271431421900-01-01 19:29:4225.02.481.6013.600000430.0203.000000203.000000NaNNaNZürich, Untermoosstrasse2754.0Zürich, Rautistrasse2034.02760.8000492760.800049NaNNaN
96594178135405411900-01-01 16:25:0615.05.460.2011.130000438.052.000000NaN52.0NaNZürich, Rautihalde2033.0Zürich, Schulhaus Buchlern2276.0578.760010NaN578.760010NaN

Last rows

df_index Linie Richtung Anzahl_Haltestellen Haltestelle_Id Nachste_Haltestelle_Id Uhrzeit Anzahl_Messungen Einsteiger Aussteiger Besetzung Distanz Tage_DTV Tage_DWV Tage_SA Tage_SO Haltestelle Haltestelle_Nummer Nachste_Haltestelle Nachste_Haltestelle_Nummer Durchschnitt_Tag Durchschnitt_Wochentag Durchschnitt_Samstag Durchschnitt_Sonntag
9908962331211211201900-01-01 05:37:006.03.16NaN3.160000299.027.69000127.690001NaNNaNZürich, Kienastenwies1389.0Zürich, Zweiackerstrasse3011.087.50040487.500404NaNNaN
9917329418241731751900-01-01 17:54:5423.03.3913.91120.040001806.048.00000048.000000NaNNaNZürich, Balgrist421.0Zürich, Rehalp2549.05761.9199225761.919922NaNNaN
99298234311171051061900-01-01 18:43:542.01.000.5035.500000274.029.430000NaN29.430000NaNZürich, Neumarkt1724.0Zürich, Kunsthaus1472.01044.765015NaN1044.765015NaN
9933818305197657661900-01-01 08:33:4810.00.100.602.700000638.0251.000000251.000000NaNNaNBergdietikon, Vorbühl2789.0Kindhausen AG, Eichholz757.0677.700012677.700012NaNNaN
994204554252092101900-01-01 05:23:3647.00.100.121.820000282.0203.000000203.000000NaNNaNZürich, Sportweg2449.0Zürich, Aargauerstrasse7.0369.460022369.460022NaNNaN
99519245321216076081900-01-01 14:25:5416.02.873.8121.620001413.077.57000077.570000NaNNaNZürich, Höfliweg1272.0Zürich, Friesenberg902.01677.0634771677.063477NaNNaN
996536912114812291900-01-01 23:59:481.0NaNNaN2.000000312.0NaNNaNNaNNaNZürich, Albisriederplatz48.0Zürich, Zypressenstrasse3019.0NaNNaNNaNNaN
997254293312087741900-01-01 22:19:127.00.140.7115.000000243.020.29999920.299999NaNNaNZürich, Nürenbergstrasse1753.0Zürich, Bahnhof Wipkingen3043.0304.500000304.500000NaNNaN
998397917511249501900-01-01 16:13:009.01.772.7726.440001432.031.389999NaN31.389999NaNZürich, Bollingerweg3789.0Zürich, Max-Bill-Platz3764.0829.951599NaN829.951599NaN
99966557743224774761900-01-01 09:00:5415.01.00NaN1.730000393.062.000000NaNNaN62.0Maur, Kirche1403.0Maur, Dorf678.0107.260002NaNNaN107.260002